IMPROVEMENT OF KERNEL ESTIMATORS OF THE FAILURE
RATE FUNCTION USING THE GENERALIZED JACKKNIFE

by 

Nozer  D.  Singpurwalla 
Man-Yuen  Wong 

1. Introduction and Summary

The failure rate function is one of the most important parameters in reliability theory. Of the several methods that have been proposed for estimating the failure rate, those based upon weighting functions, or "kernels," are quite common. These kernels have, among other things, the following two features:

(i) they are nonnegative, and
(ii) they are absolutely integrable on $(-\infty, \infty)$.

Such kernels are called $L_1$ kernels. Watson and Leadbetter (1964a, 1964b) show that the kernel estimators of the failure rate at some point $x_0$ are usually biased; the bias converges to zero as the sample size increases to infinity.

Our motivation for undertaking the research reported here is to explore ways in which we can reduce the bias and improve the rate of convergence of the mean square error (MSE). Our conclusion is that if the failure rate function is sufficiently "smooth," and if the kernel is suitably chosen, then the bias contribution to the asymptotic MSE can in principle be eliminated to any desired order, and the rate of convergence of the asymptotic MSE can be brought as close to $n^{-1}$ as is desired. However, in order to achieve these properties, the nonnegativity and/or the absolute integrability restrictions on the kernels will have to be relaxed.

An interesting (though not surprising) aspect of our work is that the desired kernels can be constructed by the "generalized jackknife" (GJ) method of Gray and Schucany [cf. Schucany and Sommers (1977)]. Viewed alternatively, this means that applying the GJ to two kernel estimators of the failure rate, each based upon a nonnegative kernel, is equivalent to directly producing a kernel estimator using a kernel not restricted to be nonnegative. Finally, we conjecture that using the GJ indefinitely is equivalent to producing a kernel estimator using a kernel which does not satisfy both (i) and (ii) above. Since kernel estimators are also used in density estimation and in the estimation of the power spectrum, the above results should be of wider interest.

The organization of our paper is as follows. In Section 2 we present a kernel estimator of the failure rate at $x_0$, $h(n,x_0)$, introduce some notation and terminology, and discuss some general properties of $h(n,x_0)$.

In Section 3 we first prove a theorem (Theorem 3.1) which gives us the asymptotic bias and enables us to obtain the asymptotic MSE of $h(n,x_0)$. An important result of Section 3 is a "saturation theorem" (Theorem 3.2). This theorem establishes that if the kernel used to obtain $h(n,x_0)$ is nonnegative and absolutely integrable, then there is a limit beyond which the rate of convergence of the bias and the MSE does not increase, even if greater smoothness of the failure rate is assumed. We show that the best possible rate of convergence of the MSE using a nonnegative kernel is of the order $n^{-4/5}$ [Equation (3.8)]. Kernels which are not restricted to be nonnegative are next introduced [Equation (3.9)]. The rate of convergence of the bias of $h(n,x_0)$ using such kernels is derived (Theorem 3.3). We conclude Section 3 by showing that the best possible rate of convergence of the MSE using kernels in $A_m$ [cf. Equation (3.9)] is of the order $n^{-2m/(2m+1)}$.


In Section 4, we introduce the generalized jackknife procedure and briefly review those properties of the procedure that are of interest to us. We show how the GJ can be used to construct suitable kernels which are not restricted to be nonnegative. In particular, we show how the GJ, when used once, can under certain conditions produce an estimator for which the best possible rate of convergence of the MSE is of the order $n^{-12/13}$. This is a clear improvement over the original kernel estimators, whose rate of convergence of the MSE is of the order $n^{-4/5}$. We conclude Section 4 by giving some examples.

In Section 5 we concern ourselves with using the generalized jackknife procedure over and over again, an indefinite number of times. We show how this procedure enables us to produce estimators for which the rate of convergence of the MSE can be brought as close to $n^{-1}$ as is desired. We give a few examples to illustrate the effect of a successive use of the GJ procedure. These examples motivate us to conjecture that an indefinite use of the GJ effectively leads us to kernels which are not absolutely integrable. This topic is explored in greater detail in another paper [Singpurwalla and Wong (1980)].


2.  Kernel  Estimates  of  the  Failure  Rate 


Suppose that the time to failure of a device is a nonnegative random variable $X$ with an absolutely continuous distribution function $F$ and a probability density function $f$. The failure rate at the point $x_0$, $h(x_0)$, for $F(x_0) < 1$, is defined as

\[
h(x_0) = \frac{f(x_0)}{1 - F(x_0)} . \tag{2.1}
\]


Given an ordered sample of $n$ lifetimes $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ from $F$, our objective is to estimate $h(x_0)$ under some very general assumptions on $F$ and $f$.


We shall consider the following kernel type estimators, originally proposed by Watson and Leadbetter (1964a, 1964b) and considered more recently by Rice and Rosenblatt (1976) and Sethuraman and Singpurwalla (1978).


Definition 2.1: A kernel estimate, $h(n,x_0)$, of the failure rate $h(x)$ at the point $x_0$ is defined as

\[
h(n,x_0) = \sum_{j=1}^{n} \frac{1}{b(n)}\, K\!\left(\frac{X_{(j)} - x_0}{b(n)}\right) \frac{1}{n-j+1} , \tag{2.2}
\]

where $K(\cdot)$ is a Borel function called the kernel of the estimator, and $b(n)$ is a sequence of positive functions of $n$ such that

\[
\text{(i)}\ \lim_{n\to\infty} b(n) = 0 , \quad \text{and} \quad \text{(ii)}\ \lim_{n\to\infty} n\,b(n) = \infty . \tag{2.3}
\]


A motivation for these estimates is that if $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ come from an arbitrary distribution $F$, then the maximum likelihood estimate of $F$ is a discrete distribution with a probability mass of $1/n$ at each $X_{(j)}$ [Grenander (1956)]; an estimate of the failure rate at $X_{(j)}$ is therefore $1/(n-j)$. In order to avoid the infinite value at $j = n$, we change our estimate of the failure rate at $X_{(j)}$ to $1/(n-j+1)$, and smear this quantity out, smoothing continuously according to the kernel $K(\cdot)$.
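
As a concrete illustration of (2.2), the estimator can be sketched in a few lines of Python. This is a minimal sketch rather than the authors' implementation: the function names `rectangular_kernel` and `kernel_failure_rate`, and the use of a fixed numeric bandwidth `b` in place of the sequence $b(n)$, are assumptions made for the example.

```python
def rectangular_kernel(x):
    """Rectangular kernel: 1/2 on [-1, 1], 0 elsewhere (integrates to 1)."""
    return 0.5 if abs(x) <= 1.0 else 0.0


def kernel_failure_rate(sample, x0, b, kernel=rectangular_kernel):
    """Kernel estimate of the failure rate at x0, following (2.2).

    sample : observed lifetimes (any order)
    x0     : point at which the failure rate is estimated
    b      : bandwidth, standing in for b(n) (assumed to be supplied)
    """
    ordered = sorted(sample)  # order statistics X_(1) <= ... <= X_(n)
    n = len(ordered)
    total = 0.0
    for j, x in enumerate(ordered, start=1):
        # mass 1/(n-j+1) placed at X_(j), smeared out by the kernel
        total += kernel((x - x0) / b) / (n - j + 1)
    return total / b
```

With the sample 1, 2, 3, 4, the point $x_0 = 2.5$ and $b = 1$, only the second and third order statistics fall within the kernel's support, so the estimate is $\tfrac12 \cdot \tfrac13 + \tfrac12 \cdot \tfrac12 = 5/12$.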

We restrict our consideration to kernels which satisfy the following conditions:

\[
\text{(i)}\ \sup_x |K(x)| < \infty , \qquad \text{(ii)}\ K(-x) = K(x) , \qquad \text{(iii)}\ \int K(x)\,dx = 1 ; \tag{2.4a}
\]

\[
\text{(iv)}\ \int |K(x)|\,dx < \infty , \qquad \text{(v)}\ \lim_{|x|\to\infty} |x K(x)| = 0 . \tag{2.4b}
\]

When (iv) above holds, we shall say that $K \in L_1$.

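Examples of some kernels which satisfy (2.4a) and (2.4b) are the following:

\[
K(x) = \begin{cases} \tfrac12 , & \text{if } |x| \le 1 \\ 0 , & \text{if } |x| > 1 \end{cases} \qquad \text{(rectangular)}
\]

\[
K(x) = \begin{cases} 1 - |x| , & \text{if } |x| \le 1 \\ 0 , & \text{if } |x| > 1 \end{cases} \qquad \text{(triangular)}
\]

\[
K(x) = (2\pi)^{-1/2} e^{-x^2/2} \qquad \text{(Weierstrass)}
\]

\[
K(x) = \tfrac12 e^{-|x|} \qquad \text{(Picard)}
\]

\[
K(x) = \left[\pi (1 + x^2)\right]^{-1} \qquad \text{(Cauchy)}
\]

We remark that all the kernels given above are nonnegative.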

We  shall  find  it  convenient  to  introduce  the  following  definitions. 


Definition 2.2: A sequence of functions $\{\delta_n(x)\}$ is called a $\delta$-function sequence if $\delta_n(x)$ can be written in the form

\[
\delta_n(x) = \frac{1}{b(n)}\, K\!\left(\frac{x}{b(n)}\right) ,
\]

where $K(\cdot)$ is a kernel which satisfies (2.4a) and (2.4b), and $b(n)$ is a sequence of nonnegative decreasing functions which satisfy (2.3).


Thus (2.2) can be written as

\[
h(n,x_0) = \sum_{j=1}^{n} \frac{\delta_n(X_{(j)} - x_0)}{n-j+1} . \tag{2.5}
\]


Definition 2.3: For a given $\delta$-function sequence $\delta_n(\cdot)$, a distribution function $F$ is said to be in the class $C_\delta$ if, for any fixed $x_0$ and for any fixed $\lambda > 0$, there exists a $C_\lambda > 0$ such that

\[
\frac{|\delta_n(x - x_0)|}{1 - F(x)} \le C_\lambda , \tag{2.6}
\]

for all $|x - x_0| > \lambda$ and for all sufficiently large $n$.


Note that if the kernel $K$ is bandlimited, that is, if $K(x) = 0$ for all $|x| > C$, for some finite real value $C > 0$, then $C_\delta$ is the class of all distribution functions.

Before describing the properties of the estimator $h(n,x_0)$, we should indicate a few abbreviations. If $a_n$ and $b_n$ are two sequences, then "$a_n \sim b_n$" is read "$a_n$ is asymptotically equivalent to $b_n$," and means that the ratio of $a_n$ to $b_n$ has limit one. The notation "$a_n = o(b_n)$" means that the ratio of $a_n$ to $b_n$ has limit 0, and "$a_n = O(b_n)$" means that the absolute value of the ratio is bounded in the limit. The terms $o(b_n)$ and $O(b_n)$ are frequently used to represent some unknown function of $n$ which has the appropriate property.

The  following  theorem  establishes  the  asymptotic  unbiasedness 
and  the  consistency  of  the  estimator  (2.2). 


Theorem 2.1 [Watson and Leadbetter (1964a)]: Let $\{\delta_n(x)\}$ be a $\delta$-function sequence and let $F(x)$ be a distribution function in $C_\delta$. If $h(x)$ is continuous at $x_0$, and if $F(x_0) < 1$, then $h(n,x_0)$ given by (2.2) is an asymptotically unbiased estimator of $h(x_0)$.

Furthermore, if $\alpha_n = \int \delta_n^2(x)\,dx < \infty$ and $\alpha_n = o(n)$, then $h(n,x_0)$ is consistent with an asymptotic variance

\[
\mathrm{Var}[h(n,x_0)] \sim \frac{\alpha_n}{n}\, \frac{h(x_0)}{1 - F(x_0)} . \tag{2.7}
\]

That is, the variance of $h(n,x_0)$ goes to zero at the same rate at which $\alpha_n/n$ goes to zero.


3. Rates of Convergence of the Bias and
the Mean Square Errors


In order to compare estimates of $h(x_0)$ using different kernels, we will have to study the rates of convergence of their bias and their mean square errors. In general, the rate of decrease of the bias and the MSE depends on the particular kernel (or the $\delta$-function sequence) that is chosen. Furthermore, for a given kernel, the rate of convergence improves with the smoothness of the failure rate function $h$. However, for some kernels (in particular, the nonnegative kernels), there may be a limit beyond which the rates of convergence do not increase, even if greater smoothness of $h$ is assumed. This phenomenon is called "saturation" [see Shapiro (1969)].

To see this, we shall first give the following theorem on the asymptotic bias of $h(n,x_0)$.


Theorem 3.1: Let $X_1, X_2, \ldots, X_n$ be a random sample of lifetimes from an absolutely continuous distribution function $F$ with probability density function $f$; let $x_0$ be a continuity point of the failure rate $h$ of $F$, with $F(x_0) < 1$. Let $h(n,x_0)$, an estimate of $h(x_0)$, be given by (2.5). If $F \in C_\delta$, and if, for some positive integer $m \ge 1$, $h \in C^m$ (i.e., $h$ is $m$ times continuously differentiable), $\lim_{n\to\infty} n\,b^m(n) = \infty$, and the kernel $K$ is such that $x^m K \in L_1$, then the bias of $h(n,x_0)$ is given by

\[
\mathrm{Bias}[h(n,x_0)] = \sum_{j=1}^{m} b^j(n)\, \frac{h^{(j)}(x_0)}{j!} \int x^j K(x)\,dx + o(b^m(n)) . \tag{3.1}
\]

Proof:

\[
\begin{aligned}
E[h(n,x_0)] &= E \sum_{j=1}^{n} \frac{\delta_n(X_{(j)} - x_0)}{n-j+1} \\
&= \sum_{j=1}^{n} \int \frac{1}{n-j+1}\, \delta_n(u - x_0)\, f_{X_{(j)}}(u)\,du \\
&= \sum_{j=1}^{n} \int \frac{1}{n-j+1}\, \delta_n(u - x_0)\, \frac{n!}{(j-1)!\,(n-j)!}\, (F(u))^{j-1} f(u)\, (1 - F(u))^{n-j}\,du \\
&= \int \sum_{j=1}^{n} \binom{n}{j-1}\, \delta_n(u - x_0)\, h(u)\, (F(u))^{j-1} (1 - F(u))^{n-j+1}\,du \\
&= \int h(u)\, \delta_n(u - x_0) \left[ \sum_{j=1}^{n+1} \binom{n}{j-1} (F(u))^{j-1} (1 - F(u))^{n-j+1} - F^n(u) \right] du \\
&= \int h(u)\, \delta_n(u - x_0)\, [1 - F^n(u)]\,du \\
&= \int h(u)\, \delta_n(u - x_0)\,du - \int h(u)\, \delta_n(u - x_0)\, F^n(u)\,du .
\end{aligned}
\tag{3.2}
\]


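Consider the first term of the above expression; note that it can also be written as

\[
\int h(u)\, \frac{1}{b(n)}\, K\!\left(\frac{u - x_0}{b(n)}\right) du = \int h(x_0 + v\,b(n))\, K(v)\,dv .
\]

Using a Taylor's series expansion about $x_0$, we can write the first term of (3.2) as

\[
h(x_0) + \sum_{j=1}^{m} b^j(n)\, \frac{h^{(j)}(x_0)}{j!} \int v^j K(v)\,dv + o(b^m(n)) ,
\]

since $h \in C^m$ and $x^m K \in L_1$.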


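In order to see that the second term of (3.2) is $o(b^m(n))$, we choose a $\lambda > 0$ so that $F(x_0 + \lambda) < 1$ and $h(u)$ is bounded in $|u - x_0| \le \lambda$. Then

\[
\begin{aligned}
\left| \int h(u)\, \delta_n(u - x_0)\, F^n(u)\,du \right|
&\le \int h(u)\, |\delta_n(u - x_0)|\, F^n(u)\,du \\
&= \int_{|u - x_0| > \lambda} f(u)\, \frac{|\delta_n(u - x_0)|}{1 - F(u)}\, F^n(u)\,du + \int_{|u - x_0| \le \lambda} h(u)\, |\delta_n(u - x_0)|\, F^n(u)\,du \\
&\le C_\lambda \int_0^1 F^n\,dF + \text{const} \cdot F^n(x_0 + \lambda) \qquad \text{[by (2.6)]} \\
&= \frac{C_\lambda}{n+1} + \text{const} \cdot F^n(x_0 + \lambda) .
\end{aligned}
\]

Using the above inequality we observe that

\[
\frac{1}{b^m(n)} \left| \int h(u)\, \delta_n(u - x_0)\, F^n(u)\,du \right| \le \frac{C_\lambda}{(n+1)\, b^m(n)} + \text{const} \cdot \frac{F^n(x_0 + \lambda)}{b^m(n)} \to 0 ,
\]

as $n \to \infty$; thus

\[
\int h(u)\, \delta_n(u - x_0)\, F^n(u)\,du = o(b^m(n)) .
\]

The statement of the theorem follows if we combine our results on the two terms of (3.2). //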
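
The step in the proof above that collapses the sum over $j$ into $1 - F^n(u)$ rests on the binomial theorem: the sum from $j = 1$ to $n$ is the full binomial expansion minus its last term. A quick numerical check of that identity is sketched below; the function name `binomial_sum` and the variable `p`, standing for $F(u)$, are assumptions made for the illustration.

```python
from math import comb


def binomial_sum(n, p):
    """Sum over j = 1..n of C(n, j-1) * p^(j-1) * (1-p)^(n-j+1)."""
    return sum(comb(n, j - 1) * p ** (j - 1) * (1 - p) ** (n - j + 1)
               for j in range(1, n + 1))
```

For any $n$ and any $p$ in $[0, 1]$ the value returned should agree with $1 - p^n$ up to rounding error.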


Since $K$ is assumed to be an even function [i.e., $K(-x) = K(x)$], the bias can be written (more precisely) as

\[
\mathrm{Bias}[h(n,x_0)] = \sum_{k=1}^{t} b^{2k}(n)\, \frac{h^{(2k)}(x_0)}{(2k)!} \int x^{2k} K(x)\,dx + o(b^{2t}(n)) , \tag{3.3}
\]

where $m = 2t$ for some positive integer $t$.


Note that Theorem 3.1 is valid for all kernels $K$, nonnegative or otherwise, which satisfy the conditions of the theorem. In order to discuss a saturation result for $h(n,x_0)$ when $K$ is nonnegative, we look at a special case of (3.1), when $m = 2$; that is, when

\[
\text{(i)}\ x^2 K \in L_1 , \qquad \text{(ii)}\ \lim_{n\to\infty} n\,b^2(n) = \infty , \quad \text{and} \qquad \text{(iii)}\ h \in C^2 , \tag{3.4}
\]

we have as a corollary to Theorem 3.1

Theorem 3.2 (pointwise saturation theorem): Let $h(n,x_0)$ be the failure rate estimate of $h(x_0)$ given by (2.5); let $x_0$ be a continuity point of the failure rate $h$, with $F(x_0) < 1$. Under conditions (3.4), and if $F \in C_\delta$, then

\[
\lim_{n\to\infty} \frac{\mathrm{Bias}[h(n,x_0)]}{b^2(n)} = \frac{h''(x_0)}{2!} \int x^2 K(x)\,dx . \tag{3.5}
\]

Proof: Follows from Theorem 3.1 and the fact that, since $K$ is even, $\int x K(x)\,dx = 0$. //

For those kernels $K$ which are nonnegative and satisfy the conditions of Theorem 3.2 (e.g., when $K$ is the rectangular, triangular, Weierstrass, or Picard kernel), $\int x^2 K(x)\,dx$ is always nonzero. If $h''(x_0) \ne 0$, then the rate of convergence of the bias, according to (3.5), is no faster than $b^2(n)$. Thus we conclude that the best possible rate of decrease of the bias of $h(n,x_0)$ with nonnegative kernels is $b^2(n)$.
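
The claim that $\int x^2 K(x)\,dx$ is nonzero for the nonnegative kernels listed in Section 2 can be checked numerically. The following is a rough sketch using a midpoint rule; the helper `second_moment`, the dictionary `KERNELS`, and the integration limits and step count are choices made for this illustration. The Cauchy kernel is left out because $x^2 K$ is not integrable for it.

```python
import math


def second_moment(kernel, lo=-50.0, hi=50.0, steps=200_000):
    """Midpoint-rule approximation of the integral of x^2 K(x) over [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        total += x * x * kernel(x)
    return total * width


KERNELS = {
    "rectangular": lambda x: 0.5 if abs(x) <= 1.0 else 0.0,
    "triangular": lambda x: max(0.0, 1.0 - abs(x)),
    "Weierstrass": lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi),
    "Picard": lambda x: 0.5 * math.exp(-abs(x)),
}
```

The exact second moments are $1/3$, $1/6$, $1$ and $2$ respectively, all strictly positive, which is why the $b^2(n)$ term in the bias cannot be removed while $K$ stays nonnegative.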

We now consider the mean square error of the estimator $h(n,x_0)$ under the assumptions of Theorem 3.2. Since

\[
\mathrm{Bias}[h(n,x_0)] \sim b^2(n)\, \frac{h''(x_0)}{2!} \int x^2 K(x)\,dx ,
\]

and the asymptotic variance of $h(n,x_0)$ given by (2.7) is

\[
\mathrm{Var}[h(n,x_0)] \sim \frac{\int \delta_n^2(x)\,dx}{n}\, \frac{h(x_0)}{1 - F(x_0)} .
\]

The above results motivate us to consider kernels which are not restricted to be nonnegative, and which are not $L_1$. In Singpurwalla and Wong (1980) we consider a kernel which is not $L_1$ and which can take negative values. However, we shall first show here (Theorem 3.3) that we can obtain an improvement in the rate of convergence of the MSE of $h(n,x_0)$ when $K$ is not restricted to be nonnegative but is still an $L_1$ kernel, although one might think that nonnegative kernels would provide the best estimates, since failure rates are nonnegative. It is to be expected that the price we pay for obtaining faster rates of convergence of the MSE using such kernels is that the resulting estimate of the failure rate could be negative at some points.

Let $A_m$ (where $m \ge 2$ is a positive integer) be the class of all real valued Borel measurable bounded functions $K$ (kernels) which satisfy conditions (2.4) and the following condition:

\[
\int x^j K(x)\,dx = 0 , \quad \text{for } j = 1, 2, \ldots, m-1 . \tag{3.9}
\]

All kernels (nonnegative or otherwise) which satisfy (2.4) also satisfy (3.9) for $m = 2$; thus $A_2$ is the class of all kernels which satisfy (2.4). For $m \ge 3$, the class $A_m$ contains no nonnegative functions, and its elements will therefore lead to possibly negative failure rate estimates if $K \in A_m$ is used in estimation. Since $K$ is an even function, $K \in A_{2k-1}$ implies that $K \in A_{2k}$, for any $k = 2, 3, \ldots$; thus the class $A_m$ need only be defined for an even integer $m > 1$.

Theorem 3.3: Let $X_1, X_2, \ldots, X_n$ be a random sample of lifetimes from an absolutely continuous distribution function $F$ with probability density function $f$; let $x_0$ be a continuity point of the failure rate $h$ of $F$, with $F(x_0) < 1$. Let $h(n,x_0)$ be an estimate of $h(x_0)$, given by (2.5). If $F \in C_\delta$, and if, for some positive integer $m \ge 1$, the following conditions hold:

\[
\text{(i)}\ K \in A_m , \qquad \text{(ii)}\ x^m K \in L_1 , \qquad \text{(iii)}\ \int x^m K(x)\,dx \ne 0 , \tag{3.10a}
\]

\[
\text{(iv)}\ \lim_{n\to\infty} n\,b^m(n) = \infty , \qquad \text{(v)}\ h \in C^m ; \tag{3.10b}
\]

then

\[
\lim_{n\to\infty} \frac{1}{b^m(n)}\, \mathrm{Bias}[h(n,x_0)] = \frac{h^{(m)}(x_0)}{m!} \int x^m K(x)\,dx .
\]

Proof: Analogous to that of Theorem 3.2, but in the light of conditions (3.10). //


Thus, the bias of the estimator $h(n,x_0)$ based on a kernel $K$ which satisfies conditions (3.10a) cannot decrease any faster than $b^m(n)$; in fact,

\[
\mathrm{Bias}[h(n,x_0)] \sim b^m(n)\, \frac{h^{(m)}(x_0)}{m!} \int x^m K(x)\,dx . \tag{3.11}
\]


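Since the asymptotic variance of $h(n,x_0)$ is still of the form

\[
\mathrm{Var}[h(n,x_0)] \sim \frac{\int \delta_n^2(x)\,dx}{n}\, \frac{h(x_0)}{1 - F(x_0)} ,
\]

the asymptotic MSE of $h(n,x_0)$ with $K \in A_m$ is

\[
\mathrm{MSE}[h(n,x_0)] \sim \frac{1}{n\,b(n)}\, \frac{h(x_0)}{1 - F(x_0)} \int K^2(v)\,dv + \left[ b^m(n)\, \frac{h^{(m)}(x_0)}{m!} \int x^m K(x)\,dx \right]^2 . \tag{3.12}
\]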


Given $h$ and $K$, we can, for a fixed value of $n$, find that $b(n)$ which minimizes the asymptotic MSE of $h(n,x_0)$. This value is

\[
b(n) = \left[ \frac{\dfrac{h(x_0)}{1 - F(x_0)} \displaystyle\int K^2(v)\,dv}{2m \left[ \dfrac{h^{(m)}(x_0)}{m!} \displaystyle\int x^m K(x)\,dx \right]^2} \right]^{\frac{1}{2m+1}} n^{-\frac{1}{2m+1}} . \tag{3.13}
\]


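For this value of $b(n)$, the optimal value of the MSE is

\[
\mathrm{MSE}[h(n,x_0)]_{\mathrm{opt}} \sim (2m+1) \left[ \frac{\int K^2(v)\,dv}{2m}\, \frac{h(x_0)}{1 - F(x_0)} \right]^{\frac{2m}{2m+1}} \left[ \frac{\int x^m K(x)\,dx}{m!}\, h^{(m)}(x_0) \right]^{\frac{2}{2m+1}} n^{-\frac{2m}{2m+1}} . \tag{3.14}
\]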


Based upon the above results we state that if $b(n) = O(n^{-1/(2m+1)})$, then $\mathrm{MSE}[h(n,x_0)]_{\mathrm{opt}} = O(n^{-2m/(2m+1)})$; i.e., $\mathrm{MSE}[h(n,x_0)] \to 0$ as $n^{-2m/(2m+1)}$. We contrast this result with our previous result using nonnegative kernels, for which $\mathrm{MSE}[h(n,x_0)] \to 0$ as $n^{-4/5}$, which is slower than $n^{-2m/(2m+1)}$ for $m \ge 3$ when using kernels which are not restricted to be nonnegative. We have proved the following important result.

Theorem 3.4: Under the conditions of Theorem 3.3, the asymptotically optimal rate of convergence of the MSE of the failure rate estimator $h(n,x_0)$ is of the order $n^{-2m/(2m+1)}$.
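
The minimization leading from (3.12) to (3.13) and (3.14) can be checked numerically. In the sketch below the shorthand `A` for $\frac{h(x_0)}{1-F(x_0)}\int K^2(v)\,dv$ and `B` for $\frac{h^{(m)}(x_0)}{m!}\int x^m K(x)\,dx$, together with the function names, are assumptions introduced for the illustration; the numerical values used to exercise it are arbitrary.

```python
def asymptotic_mse(b, n, A, B, m):
    """Right-hand side of (3.12): variance term plus squared bias term."""
    return A / (n * b) + (B * b ** m) ** 2


def optimal_bandwidth(n, A, B, m):
    """Bandwidth minimizing asymptotic_mse over b, as in (3.13)."""
    return (A / (2 * m * B * B)) ** (1.0 / (2 * m + 1)) * n ** (-1.0 / (2 * m + 1))


def optimal_mse(n, A, B, m):
    """Asymptotic MSE evaluated at the optimal bandwidth, as in (3.14)."""
    e = 2 * m + 1
    return e * (A / (2 * m)) ** (2 * m / e) * abs(B) ** (2 / e) * n ** (-2 * m / e)
```

Evaluating `asymptotic_mse` at `optimal_bandwidth(n, A, B, m)` should reproduce `optimal_mse(n, A, B, m)`, and moving the bandwidth in either direction should increase the value. Setting $m = 2, 4, 6$ gives the exponents $4/5$, $8/9$ and $12/13$ quoted in the text.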
4. The Generalized Jackknife and Kernels in $A_m$


Finding kernels $K \in A_2$ is quite straightforward, since many of the continuous density functions which are symmetric about 0 satisfy the conditions of Theorem 3.2. The best possible rate of convergence of the MSE of $h(n,x_0)$ based on these kernels is $n^{-4/5}$. However, as we have seen before, we can obtain estimators with a faster rate of convergence in bias and MSE by using kernels in $A_m$, $m \ge 3$. In this event the bias decreases in the order of $b^m(n)$ and the optimal MSE tends to zero as $n^{-2m/(2m+1)}$. Our next objective is to discuss a procedure which leads us to such kernels.


In this section we shall discuss the generalized jackknife method [Schucany, Gray, and Owen (1971)] of combining estimators, and show that this procedure, which is typically used to remove the bias and reduce the MSE, essentially leads us to kernels $K$ which belong to $A_m$, $m \ge 3$.

We shall first give a brief introduction to the generalized jackknife method of combining estimators.


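Definition 4.1 [Gray and Schucany (1972)]: Let $\hat\theta_1$ and $\hat\theta_2$ be two estimators of $\theta$. Then for any real number $R \ne 1$, we define the generalized jackknife $\tilde\theta$ of $\theta$ by

\[
\tilde\theta = \frac{\hat\theta_1 - R\,\hat\theta_2}{1 - R} .
\]

A trivial, though important, property of the estimator $\tilde\theta$ is given by the following theorem.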


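Theorem 4.1 [Gray and Schucany (1972)]: If $E(\hat\theta_j) = \theta + b_j(n,\theta)$, $j = 1, 2$, $b_2(n,\theta) \ne 0$, and $R = \dfrac{b_1(n,\theta)}{b_2(n,\theta)} \ne 1$, then $\tilde\theta$ is an unbiased estimator of $\theta$.

In particular, if the biases of $\hat\theta_1$ and $\hat\theta_2$ have the forms $b_j(n,\theta) = f_j(n)\,b(\theta)$, $j = 1, 2$, then, under the conditions of Theorem 4.1, the estimator $\tilde\theta$ is of the form

\[
\tilde\theta = \frac{f_2(n)\,\hat\theta_1 - f_1(n)\,\hat\theta_2}{f_2(n) - f_1(n)} ,
\]

and $E(\tilde\theta) = \theta$.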
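
Because the generalized jackknife is a linear combination, its effect on the bias can be seen by applying it directly to the expectations of the two estimators. The sketch below is illustrative only: the function name `generalized_jackknife` and the numbers used to exercise it are assumptions, not values from the paper.

```python
def generalized_jackknife(theta1, theta2, R):
    """Combine two estimates as (theta1 - R * theta2) / (1 - R), with R != 1."""
    if R == 1:
        raise ValueError("R must differ from 1")
    return (theta1 - R * theta2) / (1 - R)
```

For instance, if $E(\hat\theta_1) = \theta + f_1 b$ and $E(\hat\theta_2) = \theta + f_2 b$ with $f_1 \ne f_2$, then passing these two expectations together with $R = f_1/f_2$ returns $\theta$ exactly, which is the content of Theorem 4.1 in the product-form case.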

The  idea  of  Definition  4.1  and  Theorem  4.1  can  be  generalized  to 
include  three  or  more  estimators. 


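When the bias in the estimators $\hat\theta_1, \ldots, \hat\theta_{k+1}$ can be written as the product of a function of $n$ and a function of $\theta$, we have the following result.

Theorem 4.2: If $E(\hat\theta_j) = \theta + \sum_{i=1}^{\infty} f_{ij}(n)\,b_i(\theta)$, $j = 1, 2, \ldots, k+1$, and if

\[
\det \begin{pmatrix}
1 & 1 & \cdots & 1 \\
f_{11}(n) & f_{12}(n) & \cdots & f_{1,k+1}(n) \\
\vdots & \vdots & & \vdots \\
f_{k1}(n) & f_{k2}(n) & \cdots & f_{k,k+1}(n)
\end{pmatrix} \ne 0 ,
\]

then $E(\tilde\theta) = \theta + B_G(n,\theta)$, where

\[
B_G(n,\theta) =
\frac{\det \begin{pmatrix}
B_1 & B_2 & \cdots & B_{k+1} \\
f_{11}(n) & f_{12}(n) & \cdots & f_{1,k+1}(n) \\
\vdots & \vdots & & \vdots \\
f_{k1}(n) & f_{k2}(n) & \cdots & f_{k,k+1}(n)
\end{pmatrix}}
{\det \begin{pmatrix}
1 & 1 & \cdots & 1 \\
f_{11}(n) & f_{12}(n) & \cdots & f_{1,k+1}(n) \\
\vdots & \vdots & & \vdots \\
f_{k1}(n) & f_{k2}(n) & \cdots & f_{k,k+1}(n)
\end{pmatrix}} ,
\]

and $B_j = \sum_{i=k+1}^{\infty} f_{ij}(n)\,b_i(\theta)$, $j = 1, 2, \ldots, k+1$.

Corollary 4.3: If $E(\hat\theta_j) = \theta + \sum_{i=1}^{k} f_{ij}(n)\,b_i(\theta)$, $j = 1, 2, \ldots, k+1$, then $E(\tilde\theta) = \theta$.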


For a proof of the above results, we refer the reader to Schucany, Gray, and Owen (1971) or Gray and Schucany (1972). Miller (1974, 1978) has given an up-to-date summary of the jackknife and its various applications.


It is apparent from Theorem 4.1 and Corollary 4.3 that generalized jackknifing is a way of bias reduction. It is with this thought in mind that we consider combinations of estimators $h(n,x_0)$ based upon different kernels in $A_2$ (using the generalized jackknife) to arrive at estimators of $h(x_0)$ which have a smaller bias. It turns out, as we shall soon see, that the generalized jackknifed estimator of $h(x_0)$ is precisely that which we would have obtained by considering a kernel in $A_m$, $m \ge 3$. Thus jackknifing kernel estimates of the failure rate based upon kernels in $A_2$ will produce estimators which have faster rates of convergence of the bias and the mean square error, but which, by virtue of the fact that they could also have been produced by kernels in $A_m$, $m \ge 3$, may be negative.


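Let us consider two estimators of $h(x_0)$ based on kernels $K_1$ and $K_2$, where $K_1$ and $K_2$ belong to $A_2$ (not necessarily to $A_m$, $m \ge 3$); thus we have

\[
h_i(n,x_0) = \frac{1}{b_i(n)} \sum_{j=1}^{n} \frac{1}{n-j+1}\, K_i\!\left(\frac{X_{(j)} - x_0}{b_i(n)}\right) , \quad i = 1, 2 . \tag{4.1}
\]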


The generalized jackknife estimator $h(n,x_0)$ of $h_1$ and $h_2$ is

\[
h(n,x_0) = \frac{h_1(n,x_0) - R\,h_2(n,x_0)}{1 - R} ,
\]

where $R \ne 1$ is a constant to be determined. If the estimators $h_1$ and $h_2$ are such that the conditions of Theorem 3.1 hold for $m = 2t \ge 6$, then from (3.3)

\[
\begin{aligned}
E[h(n,x_0) - h(x_0)]
&= E\left[ \frac{h_1(n,x_0) - R\,h_2(n,x_0)}{1 - R} - h(x_0) \right] \\
&= \frac{1}{1-R} \Big\{ E[h_1(n,x_0) - h(x_0)] - R\,E[h_2(n,x_0) - h(x_0)] \Big\} \\
&= \frac{1}{1-R} \left\{ \sum_{k=1}^{t} \frac{h^{(2k)}(x_0)}{(2k)!}\, b_1^{2k}(n)\, I(K_1,2k) - R \sum_{k=1}^{t} \frac{h^{(2k)}(x_0)}{(2k)!}\, b_2^{2k}(n)\, I(K_2,2k) \right\} \\
&\qquad + o(b_1^{2t}(n)) + o(b_2^{2t}(n)) \\
&= \frac{1}{1-R} \sum_{k=1}^{t} \left[ b_1^{2k}(n)\, I(K_1,2k) - R\, b_2^{2k}(n)\, I(K_2,2k) \right] \frac{h^{(2k)}(x_0)}{(2k)!} \\
&\qquad + o(b_1^{2t}(n)) + o(b_2^{2t}(n)) ,
\end{aligned}
\tag{4.3}
\]


where $I(K,q) = \int x^q K(x)\,dx$. If we set

\[
R = \frac{b_1^2(n)\, I(K_1,2)}{b_2^2(n)\, I(K_2,2)} , \tag{4.4}
\]

then $b_1^2(n)\, I(K_1,2) - R\, b_2^2(n)\, I(K_2,2) = 0$, and the leading bias term of $h(n,x_0)$, that is, the term containing $h''(x_0)$ in (4.3), is eliminated. The estimator $h(n,x_0)$ now becomes

\[
\begin{aligned}
h(n,x_0)
&= \frac{\dfrac{1}{b_1(n)} \displaystyle\sum_{j=1}^{n} \dfrac{1}{n-j+1}\, K_1\!\left(\dfrac{X_{(j)} - x_0}{b_1(n)}\right) - \dfrac{b_1^2(n)\, I(K_1,2)}{b_2^2(n)\, I(K_2,2)}\, \dfrac{1}{b_2(n)} \displaystyle\sum_{j=1}^{n} \dfrac{1}{n-j+1}\, K_2\!\left(\dfrac{X_{(j)} - x_0}{b_2(n)}\right)}{1 - \dfrac{b_1^2(n)\, I(K_1,2)}{b_2^2(n)\, I(K_2,2)}} \\
&= \frac{1}{b_1(n)} \sum_{j=1}^{n} \frac{1}{n-j+1}\, \frac{K_1\!\left(\dfrac{X_{(j)} - x_0}{b_1(n)}\right) - \dfrac{b_1^3(n)\, I(K_1,2)}{b_2^3(n)\, I(K_2,2)}\, K_2\!\left(\dfrac{X_{(j)} - x_0}{b_2(n)}\right)}{1 - \dfrac{b_1^2(n)\, I(K_1,2)}{b_2^2(n)\, I(K_2,2)}} \\
&= \frac{1}{b_1(n)} \sum_{j=1}^{n} \frac{1}{n-j+1}\, K\!\left(\frac{X_{(j)} - x_0}{b_1(n)}\right) ,
\end{aligned}
\tag{4.5}
\]

where

\[
K(u) = \frac{K_1(u) - c^3(n)\, \dfrac{I(K_1,2)}{I(K_2,2)}\, K_2(c(n)u)}{1 - c^2(n)\, \dfrac{I(K_1,2)}{I(K_2,2)}} , \qquad c(n) = \frac{b_1(n)}{b_2(n)} . \tag{4.6}
\]

We can write $K(u) = \alpha K_1(u) - \beta K_2(c(n)u)$, where

\[
\alpha = \frac{1}{1 - c^2(n)\, \dfrac{I(K_1,2)}{I(K_2,2)}} \quad \text{and} \quad \beta = \frac{c^3(n)\, \dfrac{I(K_1,2)}{I(K_2,2)}}{1 - c^2(n)\, \dfrac{I(K_1,2)}{I(K_2,2)}} . \tag{4.7}
\]

Thus, by using the generalized jackknife to combine estimators $h_i(n,x_0)$ based upon kernels $K_i$, $i = 1, 2$, which belong to $A_2$ (and not necessarily to $A_m$, $m \ge 3$), and by choosing $R$ according to (4.4), we have produced an estimator $h(n,x_0)$ based upon a kernel $K$.

We claim that $K \in A_4$. To see this, we first note that

\[
I(K,0) = \int K(u)\,du = \alpha \int K_1(u)\,du - \beta \int K_2(c(n)u)\,du = \alpha - \frac{\beta}{c(n)} = 1 .
\]

We can now easily verify that $K$ satisfies (2.4). To see that $K$ satisfies (3.9) for $m = 4$, it suffices to show that $I(K,1) = I(K,2) = I(K,3) = 0$. Since $K_1$ and $K_2$ are even functions, $I(K,1) = I(K,3) = 0$. Now,

\[
I(K,2) = \int u^2 K(u)\,du = \int \alpha u^2 K_1(u)\,du - \int \beta u^2 K_2(c(n)u)\,du = \alpha\, I(K_1,2) - \frac{\beta}{c^3(n)}\, I(K_2,2) = 0 .
\]

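The optimal MSE of $h(n,x_0)$ with kernel $K$ is given by (3.14), with $m = 4$, as

\[
\mathrm{MSE}[h(n,x_0)]_{\mathrm{opt}} \sim (2 \cdot 4 + 1) \left[ \frac{\int K^2(u)\,du}{2 \cdot 4}\, \frac{h(x_0)}{1 - F(x_0)} \right]^{8/9} \left[ \frac{\int u^4 K(u)\,du}{4!}\, h^{(4)}(x_0) \right]^{2/9} n^{-8/9} .
\]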


Note that by choosing $R$ according to (4.4), we have eliminated the leading bias term of (4.3). We are still at liberty to choose $b_1(n)$ and $b_2(n)$ in any manner, provided $R \ne 1$. Clearly, by choosing $b_1(n)$ and $b_2(n)$ in such a manner that

\[
b_1^4(n)\, I(K_1,4) - R\, b_2^4(n)\, I(K_2,4) = 0 ,
\]

that is,

\[
\frac{b_1^2(n)}{b_2^2(n)} = \frac{I(K_2,4)\, I(K_1,2)}{I(K_1,4)\, I(K_2,2)} , \tag{4.8}
\]

we can eliminate the second bias term in (4.3), that is, the term containing $h^{(4)}(x_0)$.

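When (4.4) and (4.8) are used, then the kernel $K$ given in (4.6) belongs to $A_6$, since

\[
\begin{aligned}
I(K,4) &= \int u^4 K(u)\,du = \alpha \int u^4 K_1(u)\,du - \beta \int u^4 K_2(c(n)u)\,du \\
&= \alpha\, I(K_1,4) - \frac{\beta}{c^5(n)}\, I(K_2,4) \\
&= \frac{I(K_1,4) - \dfrac{I(K_1,2)}{I(K_2,2)}\, c^3(n)\, \dfrac{1}{c^5(n)}\, I(K_2,4)}{1 - \dfrac{I(K_1,2)}{I(K_2,2)}\, c^2(n)} \\
&= \frac{I(K_1,4) - \dfrac{I(K_1,2)}{I(K_2,2)}\, \dfrac{I(K_1,4)\, I(K_2,2)}{I(K_2,4)\, I(K_1,2)}\, I(K_2,4)}{1 - \dfrac{I(K_1,2)}{I(K_2,2)}\, c^2(n)} \\
&= 0 .
\end{aligned}
\]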


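When $K \in A_6$, the asymptotic MSE of $h(n,x_0)$ is, by (3.12),

\[
\mathrm{MSE}[h(n,x_0)] \sim \frac{1}{n\,b(n)}\, \frac{h(x_0)}{1 - F(x_0)} \int K^2(u)\,du + \left[ b^6(n)\, \frac{h^{(6)}(x_0)}{6!} \int u^6 K(u)\,du \right]^2 .
\]

Given $h$ and $K \in A_6$, we can, for a fixed value of $n$, find that $b(n)$ which minimizes the asymptotic MSE of $h(n,x_0)$. This value is

\[
b(n) = \left[ \frac{\dfrac{h(x_0)}{1 - F(x_0)} \displaystyle\int K^2(u)\,du}{2 \cdot 6 \left[ \dfrac{h^{(6)}(x_0)}{6!} \displaystyle\int u^6 K(u)\,du \right]^2} \right]^{1/13} n^{-1/13} .
\]

The optimal value of the MSE for this choice of $b(n)$ is

\[
\mathrm{MSE}[h(n,x_0)]_{\mathrm{opt}} \sim (2 \cdot 6 + 1) \left[ \frac{\int K^2(u)\,du}{2 \cdot 6}\, \frac{h(x_0)}{1 - F(x_0)} \right]^{12/13} \left[ \frac{\int u^6 K(u)\,du}{6!}\, h^{(6)}(x_0) \right]^{2/13} n^{-12/13} .
\]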

Since $\mathrm{MSE}[h(n,x_0)]_{\mathrm{opt}} \to 0$ as $n^{-12/13}$, the estimator $h(n,x_0)$ has a faster rate of convergence of the MSE than the original estimators $h_1(n,x_0)$ and $h_2(n,x_0)$, whose MSE's $\to 0$ as $n^{-4/5}$.

Thus we have seen how we can form linear combinations of two estimators, $h_1$ and $h_2$, based upon kernels in $A_2$ to obtain a new estimator $h$ with kernel $K \in A_6$. The new estimator has a smaller bias than the estimators $h_1$ and $h_2$, and has a faster rate of convergence of the MSE.

We close this section by noting that the higher order jackknife may be used to obtain linear combinations of several kernel estimators, if this is desired.


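4.1 Examples.

We shall illustrate the material discussed in this section by considering some specific kernels in $A_2$, and illustrate the nature of the new kernels (in $A_4$ or $A_6$) which are obtained by jackknifing and an appropriate choice of the other constants.

Example 4.1: Consider two estimators of $h(x_0)$, $h_i(n,x_0)$, $i = 1, 2$, based on the uniform kernel $U_\alpha$ and the triangular kernel $\Delta_\alpha$, respectively. That is,

\[
U_\alpha(x) = \begin{cases} \alpha/2 , & |x| \le 1/\alpha \\ 0 , & |x| > 1/\alpha \end{cases}
\qquad \text{and} \qquad
\Delta_\alpha(x) = \begin{cases} \alpha (1 - \alpha |x|) , & |x| \le 1/\alpha \\ 0 , & |x| > 1/\alpha . \end{cases}
\]

Now

\[
\begin{aligned}
I(U_\alpha,2) &= \int x^2 U_\alpha(x)\,dx = \frac{1}{3\alpha^2} , &
I(U_\alpha,4) &= \int x^4 U_\alpha(x)\,dx = \frac{1}{5\alpha^4} , \\
I(\Delta_\alpha,2) &= \int x^2 \Delta_\alpha(x)\,dx = \frac{1}{6\alpha^2} , \text{ and} &
I(\Delta_\alpha,4) &= \int x^4 \Delta_\alpha(x)\,dx = \frac{1}{15\alpha^4} .
\end{aligned}
\]

If we choose $b_1(n)$, $b_2(n)$, and $R$ in such a manner that

\[
c^2(n) = \frac{b_1^2(n)}{b_2^2(n)} = \frac{I(\Delta_\alpha,4)\, I(U_\alpha,2)}{I(U_\alpha,4)\, I(\Delta_\alpha,2)} = \frac{2}{3}
\]

and

\[
R = \frac{b_1^2(n)\, I(U_\alpha,2)}{b_2^2(n)\, I(\Delta_\alpha,2)} = \frac{4}{3} ,
\]

then the new kernel $K$, given by (4.6), is

\[
K(u) = 4\sqrt{2/3}\; \Delta_\alpha\!\left(\sqrt{2/3}\; u\right) - 3\, U_\alpha(u) .
\]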
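
The arithmetic of Example 4.1 can be verified exactly with rational numbers. The sketch below takes $\alpha = 1$ and works only with even moments, using the fact that for even $q$ the moment of the kernel (4.6) depends on $c(n)$ only through $c^2(n)$. The helper `jackknifed_moment`, the dictionaries of moments, and the sixth moments $1/7$ and $1/28$ (computed for this illustration, not quoted from the paper) are assumptions of the example.

```python
from fractions import Fraction as Fr


def jackknifed_moment(q, m1, m2, c2):
    """I(K, q) for even q, where K is the kernel (4.6).

    m1, m2 : dicts mapping q to I(K_1, q) and I(K_2, q)
    c2     : c(n)^2 = b_1(n)^2 / b_2(n)^2
    """
    rho = m1[2] / m2[2]
    # Substituting v = c*u gives  c^3 * int u^q K_2(c u) du = c^(2-q) * I(K_2, q).
    scale = c2 ** ((2 - q) // 2)
    return (m1[q] - rho * scale * m2[q]) / (1 - c2 * rho)


# Even moments of the uniform and triangular kernels of Example 4.1, alpha = 1.
UNIFORM = {0: Fr(1), 2: Fr(1, 3), 4: Fr(1, 5), 6: Fr(1, 7)}
TRIANGULAR = {0: Fr(1), 2: Fr(1, 6), 4: Fr(1, 15), 6: Fr(1, 28)}

c2 = (TRIANGULAR[4] * UNIFORM[2]) / (UNIFORM[4] * TRIANGULAR[2])  # from (4.8)
R = c2 * UNIFORM[2] / TRIANGULAR[2]                               # from (4.4)
```

With these inputs `c2` is $2/3$ and `R` is $4/3$, the zeroth moment of the new kernel is 1, and its second and fourth moments vanish while the sixth does not, which is what membership in $A_6$ together with condition (3.10a)(iii) requires.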
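Example 4.2: Consider two estimators of $h(x_0)$, $h_i(n,x_0)$, $i = 1, 2$, based on two uniform kernels $K_1$ and $K_2$ having different bandwidths. That is, for $\alpha_1 \ne \alpha_2$,

\[
K_i(x) = \begin{cases} \dfrac{1}{2\alpha_i} , & |x| \le \alpha_i \\ 0 , & |x| > \alpha_i , \end{cases} \qquad i = 1, 2 .
\]

[Figure: the two uniform kernels $K_1$ and $K_2$, with heights $1/(2\alpha_1)$ and $1/(2\alpha_2)$ on $[-\alpha_1, \alpha_1]$ and $[-\alpha_2, \alpha_2]$.]

Now, $I(K_i,2) = \int u^2 K_i(u)\,du = \alpha_i^2/3$, and $I(K_i,4) = \int u^4 K_i(u)\,du = \alpha_i^4/5$, $i = 1, 2$. If we choose, following (4.4),

\[
R = \frac{b_1^2(n)\, I(K_1,2)}{b_2^2(n)\, I(K_2,2)} = \frac{b_1^2(n)\, \alpha_1^2}{b_2^2(n)\, \alpha_2^2} \ne 1 ,
\]

then the new kernel, given by (4.6), will belong to $A_4$.

Furthermore, if in addition to the above, we attempt to choose

\[
\frac{b_1^2(n)}{b_2^2(n)} = c^2(n) = \frac{I(K_2,4)\, I(K_1,2)}{I(K_1,4)\, I(K_2,2)} = \frac{\alpha_2^2}{\alpha_1^2} ,
\]

then

\[
R = \frac{b_1^2(n)\, \alpha_1^2}{b_2^2(n)\, \alpha_2^2} = 1 ,
\]

and thus our attempt to obtain a kernel in $A_6$ fails.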
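
The failure described in Example 4.2 does not depend on the particular bandwidths chosen, as a short exact computation suggests. In the sketch below the function names `uniform_moment` and `jackknife_constants` are assumptions made for the illustration.

```python
from fractions import Fraction as Fr


def uniform_moment(alpha, q):
    """I(K, q) for the uniform kernel 1/(2*alpha) on [-alpha, alpha], q even."""
    return Fr(alpha) ** q / (q + 1)


def jackknife_constants(alpha1, alpha2):
    """Return (c(n)^2, R) forced by (4.8) and (4.4) for two uniform kernels."""
    c2 = (uniform_moment(alpha2, 4) * uniform_moment(alpha1, 2)) / (
        uniform_moment(alpha1, 4) * uniform_moment(alpha2, 2))
    R = c2 * uniform_moment(alpha1, 2) / uniform_moment(alpha2, 2)
    return c2, R
```

Whatever values of $\alpha_1 \ne \alpha_2$ are supplied, the second component returned is 1, so the requirement $R \ne 1$ of Definition 4.1 is violated: two uniform kernels differ only by a rescaling, and (4.8) forces the rescaled kernels to coincide.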




Suppose  that  we  let 


the 


Using  this  kernel,  the  optimal  MSE  of  the  estimator  converges  to  zero 
5. Indefinite Jackknifing and Faster Rates of Convergence


Let $h_1(n,x_0)$ be an estimator of $h(x_0)$ based on a kernel $K_1$ in $A_2$. For this estimator we know that $\mathrm{MSE}[h_1(n,x_0)]_{\mathrm{opt}} = O(n^{-4/5})$. Assume that $h \in C^{\infty}$, and for some positive real number $c \neq 1$, let

$$K_2(x) = \frac{K_1(x) - c^3 K_1(cx)}{1 - c^2}. \tag{5.1}$$


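Note that the general form of $K_2$ is analogous to the kernel $K$, given by (4.6). It can be verified that $K_2 \in A_4$, since

$$\int K_2(x)\,dx = \frac{1}{1-c^2}\left[\int K_1(x)\,dx - c^3 \int K_1(cx)\,dx\right] = \frac{1}{1-c^2}\left[1 - c^2 \int K_1(u)\,du\right] = 1,$$

and

$$\int x^2 K_2(x)\,dx = \frac{1}{1-c^2}\left[\int x^2 K_1(x)\,dx - c^3 \int x^2 K_1(cx)\,dx\right] = \frac{1}{1-c^2}\left[\int x^2 K_1(x)\,dx - \int u^2 K_1(u)\,du\right] = 0.$$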
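The two identities just verified can also be checked numerically. The following is a minimal sketch, assuming the uniform kernel on $[-1,1]$ as $K_1$ and $c = .5$; the helper names `jackknife_step` and `moment` are hypothetical, and the integrals are approximated by a midpoint rule whose cell edges are chosen to coincide with the jumps of the kernel.

```python
import numpy as np

def uniform(x):
    """Uniform kernel on [-1, 1]; a member of A_2."""
    return np.where(np.abs(x) <= 1.0, 0.5, 0.0)

def jackknife_step(kernel, c, order):
    """Return x -> [K(x) - c**(order+1) K(c x)] / (1 - c**order).

    With order=2 this is the construction (5.1)."""
    def new_kernel(x):
        return (kernel(x) - c ** (order + 1) * kernel(c * x)) / (1.0 - c ** order)
    return new_kernel

def moment(kernel, j, half_width=4.0, cells=8000):
    """Midpoint-rule approximation of the integral of x**j K(x)."""
    h = 2.0 * half_width / cells
    x = -half_width + (np.arange(cells) + 0.5) * h   # cell midpoints
    return float(np.sum(x ** j * kernel(x)) * h)

c = 0.5
K2 = jackknife_step(uniform, c, order=2)
for j in (0, 2, 4):
    print(j, moment(K2, j))
```

The zeroth moment should be close to 1 and the second close to 0. The fourth moment is not removed: the same change of variable gives $\int x^4 K_2(x)\,dx = -I(K_1,4)/c^2$, which is $-.8$ for this choice of $K_1$ and $c$.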

If $h_2(n,x_0)$ is an estimator of $h(x_0)$ based on the new kernel $K_2$, then from (3.14) we have $\mathrm{MSE}[h_2(n,x_0)]_{\mathrm{opt}} = O(n^{-8/9})$. Suppose that now we set

$$K_3(x) = \frac{K_2(x) - c^5 K_2(cx)}{1 - c^4}; \tag{5.2}$$


then $K_3 \in A_6$. If $h_3(n,x_0)$ is an estimator of $h(x_0)$ based on $K_3$, then $\mathrm{MSE}[h_3(n,x_0)]_{\mathrm{opt}} = O(n^{-12/13})$; that is, $h_3(n,x_0)$ has a faster rate of convergence of the MSE and bias than both $h_2(n,x_0)$ and $h_1(n,x_0)$.


If we continue in this manner, obtaining a kernel $K_{k-1} \in A_{2(k-1)}$, $k \ge 2$, and letting

$$K_k(x) = \frac{K_{k-1}(x) - c^{2k-1} K_{k-1}(cx)}{1 - c^{2(k-1)}},$$

then $K_k \in A_{2k}$. If $h_k(n,x_0)$ is an estimator of $h(x_0)$ based on the kernel $K_k$, then we note that $\mathrm{MSE}[h_k(n,x_0)]_{\mathrm{opt}} = O\!\left(n^{-2(2k)/[2(2k)+1]}\right)$. The estimator $h_k(n,x_0)$ has a faster rate of convergence of the MSE than the estimators $h_{k-1}(n,x_0), \ldots, h_1(n,x_0)$.

If this procedure is continued indefinitely, then the rate of convergence of the MSE can be brought as close to $n^{-1}$ as is desired. That is,

$$\lim_{k \to \infty} \mathrm{MSE}[h_k(n,x_0)]_{\mathrm{opt}} = O(n^{-1}).$$
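To see how slowly the exponent approaches 1, the rate $n^{-4k/(4k+1)}$ for a kernel in $A_{2k}$ can be tabulated. This is a small sketch with hypothetical helper names; it only evaluates the exponent stated in this section and says nothing about the constants involved.

```python
from fractions import Fraction

def mse_exponent(k):
    """Exponent r in MSE_opt = O(n**(-r)) for a kernel in A_{2k}: r = 4k / (4k + 1)."""
    return Fraction(4 * k, 4 * k + 1)

def steps_needed(target):
    """Smallest k whose exponent is at least `target` (target must be below 1)."""
    k = 1
    while mse_exponent(k) < target:
        k += 1
    return k

for k in range(1, 5):
    print(k, mse_exponent(k))            # 4/5, 8/9, 12/13, 16/17

print(steps_needed(Fraction(99, 100)))   # 25
```

Reaching an exponent of at least .99 thus requires a kernel in $A_{50}$, that is, 24 successive applications of the jackknife to a kernel in $A_2$.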


Example 5.1: Suppose that we start off with the uniform kernel $U_1 \in A_2$,

$$U_1(x) = \begin{cases} 1/2, & |x| \le 1 \\ 0, & |x| > 1, \end{cases}$$

and form a new kernel $U_2 \in A_4$ using (5.1). We shall consider the following three values of $c$:


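(a) $c = .9$; we have

$$U_2^{(a)}(x) = \frac{U_1(x) - c^3 U_1(cx)}{1 - c^2} = \begin{cases} .7132, & |x| \le 1 \\ -1.9184, & 1 < x \le 1.1111 \ \text{or}\ -1.1111 \le x < -1 \\ 0, & \text{elsewhere.} \end{cases}$$

The asymptotic MSE of

$$h_2^{(a)}(n,x_0) = \frac{1}{b_2(n)} \sum_{j=1}^{n} \frac{1}{n-j+1}\, U_2^{(a)}\!\left(\frac{x_0 - X_{(j)}}{b_2(n)}\right)$$

is

$$\mathrm{MSE}[h_2^{(a)}(n,x_0)] \sim (1.8334)\,\frac{1}{n\,b_2(n)}\,\frac{h(x_0)}{1-F(x_0)} + \left[(-.2469)\,\frac{b_2^4(n)}{4!}\,h^{(4)}(x_0)\right]^2,$$

and the optimal MSE is

$$\mathrm{MSE}[h_2^{(a)}(n,x_0)]_{\mathrm{opt}} \sim (.8779)\left[\frac{h(x_0)}{1-F(x_0)}\right]^{8/9}\left[h^{(4)}(x_0)\right]^{2/9} n^{-8/9}.$$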


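(b) $c = .5$; we have

$$U_2^{(b)}(x) = \frac{U_1(x) - c^3 U_1(cx)}{1 - c^2} = \begin{cases} .5833, & |x| \le 1 \\ -.0833, & 1 < x \le 2 \ \text{or}\ -2 \le x < -1 \\ 0, & \text{elsewhere.} \end{cases}$$

The asymptotic MSE of

$$h_2^{(b)}(n,x_0) = \frac{1}{b_2(n)} \sum_{j=1}^{n} \frac{1}{n-j+1}\, U_2^{(b)}\!\left(\frac{x_0 - X_{(j)}}{b_2(n)}\right)$$

is

$$\mathrm{MSE}[h_2^{(b)}(n,x_0)] \sim (.3472)\,\frac{1}{n\,b_2(n)}\,\frac{h(x_0)}{1-F(x_0)} + \left[(-.8000)\,\frac{b_2^4(n)}{4!}\,h^{(4)}(x_0)\right]^2,$$

and the optimal MSE is

$$\mathrm{MSE}[h_2^{(b)}(n,x_0)]_{\mathrm{opt}} \sim (.2600)\left[\frac{h(x_0)}{1-F(x_0)}\right]^{8/9}\left[h^{(4)}(x_0)\right]^{2/9} n^{-8/9}.$$

(c) $c = .1$; we have

$$U_2^{(c)}(x) = \frac{U_1(x) - c^3 U_1(cx)}{1 - c^2} = \begin{cases} .5045, & |x| \le 1 \\ -.0005, & 1 < x \le 10 \ \text{or}\ -10 \le x < -1 \\ 0, & \text{elsewhere.} \end{cases}$$

The asymptotic MSE of

$$h_2^{(c)}(n,x_0) = \frac{1}{b_2(n)} \sum_{j=1}^{n} \frac{1}{n-j+1}\, U_2^{(c)}\!\left(\frac{x_0 - X_{(j)}}{b_2(n)}\right)$$

is

$$\mathrm{MSE}[h_2^{(c)}(n,x_0)] \sim (.5091)\,\frac{1}{n\,b_2(n)}\,\frac{h(x_0)}{1-F(x_0)} + \left[(-20.000)\,\frac{b_2^4(n)}{4!}\,h^{(4)}(x_0)\right]^2,$$

and the optimal MSE is

$$\mathrm{MSE}[h_2^{(c)}(n,x_0)]_{\mathrm{opt}} \sim (.7470)\left[\frac{h(x_0)}{1-F(x_0)}\right]^{8/9}\left[h^{(4)}(x_0)\right]^{2/9} n^{-8/9}.$$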
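The numbers in Example 5.1 follow from (5.1) in closed form, which the sketch below evaluates for the three values of $c$. The second helper is an assumption about how the optimal constant arises: it minimizes $v\,A/(n b) + [m_4\, b^4 B/4!]^2$ over the bandwidth $b$, with $A = h(x_0)/[1-F(x_0)]$ and $B = h^{(4)}(x_0)$, and returns the factor multiplying $A^{8/9} B^{2/9} n^{-8/9}$. Both function names are hypothetical.

```python
def jackknifed_uniform(c):
    """Heights and fourth moment of U_2 = [U_1(x) - c^3 U_1(c x)] / (1 - c^2), 0 < c < 1."""
    inner = 0.5 * (1.0 - c ** 3) / (1.0 - c ** 2)   # value on |x| <= 1
    outer = -0.5 * c ** 3 / (1.0 - c ** 2)          # value on 1 < |x| <= 1/c
    fourth = -(1.0 / 5.0) / c ** 2                  # integral of x^4 U_2(x) dx
    return inner, outer, fourth

def optimal_mse_constant(v, m4):
    """Constant of the bandwidth-optimized MSE, given the variance coefficient v
    and the fourth-moment coefficient m4 (assumed form; see the lead-in)."""
    return (9.0 / 8.0) * 8.0 ** (1.0 / 9.0) * v ** (8.0 / 9.0) * (abs(m4) / 24.0) ** (2.0 / 9.0)

for c in (0.9, 0.5, 0.1):
    inner, outer, fourth = jackknifed_uniform(c)
    print(f"c={c}: inner={inner:.4f} outer={outer:.4f} fourth moment={fourth:.4f}")

# Coefficients as printed in case (b) of the example.
print(round(optimal_mse_constant(0.3472, -0.8000), 4))
```

Note how the negative lobe grows as $c$ approaches 1 and the fourth moment grows as $c$ approaches 0, so the choice of $c$ trades the variance coefficient against the bias coefficient.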


Example 5.2: If we jackknife using the kernel in Example 5.1, case (b), one more time using $c = .5$ in (5.2), we obtain a new kernel $U_3$.

The kernel $U_3 \in A_6$, and the optimal MSE of $h_3(n,x_0)$ has a rate of convergence of

$$\mathrm{MSE}[h_3(n,x_0)]_{\mathrm{opt}} = O(n^{-12/13}).$$




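we let $c = .5$ in (5.2) and obtain

$$\Delta_3(x) = \frac{\Delta_2(x) - c^5 \Delta_2(cx)}{1 - c^4} = \begin{cases} 1.2056 - 1.3125\,|x|, & |x| \le 1 \\ -.2167 + .1097\,|x|, & 1 < x \le 2 \ \text{or}\ -2 \le x < -1 \\ .0056 - .0014\,|x|, & 2 < x \le 4 \ \text{or}\ -4 \le x < -2 \\ 0, & \text{elsewhere.} \end{cases}$$

The rate of the optimal MSE of the estimator $h_3(n,x_0)$ based on kernel $\Delta_3$ is of order $n^{-12/13}$.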
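The piecewise-linear coefficients of $\Delta_3$ can be recovered by applying (5.1) and then (5.2) to a triangular kernel and reading off the line on each piece. The sketch assumes that the starting kernel is $\Delta_1(x) = 1 - |x|$ on $[-1,1]$ and uses $c = .5$ in both steps; `jackknife_step` and `line_through` are hypothetical helpers, and the endpoint evaluation relies on the jackknifed kernel being continuous.

```python
import numpy as np

def triangular(x):
    """Triangular kernel on [-1, 1] (assumed starting kernel in A_2)."""
    return np.maximum(1.0 - np.abs(x), 0.0)

def jackknife_step(kernel, c, order):
    """Return x -> [K(x) - c**(order+1) K(c x)] / (1 - c**order)."""
    def new_kernel(x):
        return (kernel(x) - c ** (order + 1) * kernel(c * x)) / (1.0 - c ** order)
    return new_kernel

def line_through(kernel, left, right):
    """Intercept and slope (in |x|) of a kernel that is linear on [left, right]."""
    y0, y1 = float(kernel(left)), float(kernel(right))
    slope = (y1 - y0) / (right - left)
    return y0 - slope * left, slope

c = 0.5
delta2 = jackknife_step(triangular, c, order=2)   # step (5.1)
delta3 = jackknife_step(delta2, c, order=4)       # step (5.2)

pieces = [(0.0, 1.0), (1.0, 2.0), (2.0, 4.0)]
for left, right in pieces:
    intercept, slope = line_through(delta3, left, right)
    print(f"[{left}, {right}]: {intercept:.4f} {slope:+.4f} |x|")
```

Replacing `triangular` by a uniform kernel and evaluating the result at interior points gives the corresponding step-function kernel in the same way.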

In the following final example, we wish to estimate an exponential failure rate function at the point $x_0 = 1$. To compare estimates using different kernels, the rates of convergence of the MSE's are computed. We note that the rate of convergence of the MSE is actually improved if the kernels used in estimation are not restricted to be nonnegative, although the resulting theoretical gain in efficiency may not be realized unless the sample is very large.

Example 5.4: Let the failure rate function be exponential and given by

$$h(x) = e^x, \qquad 0 \le x < \infty.$$

That is, for this form of $h(x)$, the sample $(X_1,\ldots,X_n)$ is from an extreme value distribution,

$$F(x) = 1 - \exp[-\exp(x)], \qquad -\infty < x < \infty.$$

Thus, for a fixed value of $n$ and for different kernels used in the estimation, we have the following result, as given in Table 5.1.
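For readers who wish to experiment with this setting, the following is a minimal simulation sketch. It assumes the kernel estimator has the form $\frac{1}{b}\sum_{j=1}^{n} \frac{1}{n-j+1} K\!\left(\frac{x_0 - X_{(j)}}{b}\right)$ with $X_{(j)}$ the order statistics; the sample size, bandwidth, seed, and function names are arbitrary illustrative choices and are not the values behind Table 5.1.

```python
import numpy as np

def uniform(x):
    """Uniform kernel on [-1, 1]."""
    return np.where(np.abs(x) <= 1.0, 0.5, 0.0)

def failure_rate_estimate(sample, x0, b, kernel):
    """Kernel estimate of the failure rate at x0 (assumed estimator form)."""
    xs = np.sort(sample)                              # order statistics X_(1) <= ... <= X_(n)
    n = xs.size
    weights = 1.0 / (n - np.arange(1, n + 1) + 1)     # 1/(n-j+1), j = 1..n
    return float(np.sum(weights * kernel((x0 - xs) / b)) / b)

# Draw from F(x) = 1 - exp(-exp(x)) by inversion: X = log(-log(1 - U)).
rng = np.random.default_rng(0)
u = rng.random(20000)
sample = np.log(-np.log1p(-u))

x0 = 1.0
est = failure_rate_estimate(sample, x0, b=0.2, kernel=uniform)
print(est, np.exp(x0))   # estimate and true failure rate h(1) = e
```

Swapping `uniform` for a jackknifed kernel with negative lobes allows an informal comparison, keeping in mind the remark above that the theoretical gain may only appear for very large samples.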